Add clean markdown generation for LLM-friendly page content #723
Open
nearestnabors wants to merge 3 commits into main from
- Generate clean markdown from rendered HTML pages during build
- Update /api/markdown endpoint to serve pre-generated clean markdown
- Add CopyPageOverride component to fetch clean markdown on "Copy page"
- Add frontmatter (title, description) extracted from HTML meta tags
- Fix linting issues with top-level regex and simplified logic

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
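The frontmatter step in this commit can be sketched as a small regex pass over the rendered HTML. This is a minimal illustration under stated assumptions, not the PR's code: `extractMeta`, `buildFrontmatter`, and `PageMeta` are hypothetical names, and the description regex assumes `name="description"` appears before `content="..."` in the tag. All regexes sit at module scope, in line with the "top-level regex" lint fix.

```typescript
// Hypothetical helpers: pull <title> and the description <meta> out of rendered
// HTML and emit YAML frontmatter. All regexes are declared at module scope.
const TITLE_RE = /<title[^>]*>([\s\S]*?)<\/title>/i;
// Assumes name="description" appears before content="..." in the tag.
const DESCRIPTION_RE = /<meta\s[^>]*name=(["'])description\1[^>]*content=(["'])(.*?)\2/i;
const ENTITY_RE = /&(amp|lt|gt|quot|#39);/g;
const ENTITIES: Record<string, string> = { amp: "&", lt: "<", gt: ">", quot: '"', "#39": "'" };

interface PageMeta {
  title?: string;
  description?: string;
}

// Decode the few entities commonly found in titles and descriptions (single pass).
function decodeEntities(text: string): string {
  return text.replace(ENTITY_RE, (_match: string, name: string) => ENTITIES[name]);
}

function extractMeta(html: string): PageMeta {
  const title = TITLE_RE.exec(html)?.[1].trim();
  const description = DESCRIPTION_RE.exec(html)?.[3].trim();
  return {
    title: title ? decodeEntities(title) : undefined,
    description: description ? decodeEntities(description) : undefined,
  };
}

// JSON string literals are valid YAML double-quoted scalars, so JSON.stringify
// quotes each value (keeping colons safe) and escapes quotes and backslashes.
function buildFrontmatter(meta: PageMeta): string {
  const lines = ["---"];
  if (meta.title) lines.push(`title: ${JSON.stringify(meta.title)}`);
  if (meta.description) lines.push(`description: ${JSON.stringify(meta.description)}`);
  lines.push("---", "");
  return lines.join("\n");
}
```

The result of `buildFrontmatter(extractMeta(html))` would be prepended to the converted page body; pages without a title or description simply get an empty `---`/`---` block.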
evantahler (Contributor) reviewed on Feb 4, 2026 and left a comment:
Looks like the test has been running for a few hours... there might be something preventing the process from exiting (a dangling promise?).
I'd also love to see a test for one of the clean markdown files that initially had some HTML that was successfully removed.
- Add explicit process.exit(0) to ensure script terminates after completion (event listeners on spawned server kept event loop alive)
- Add validation test for HTML element removal (script, style, svg, nav, footer, aside)
- Refactor validateGeneratedContent into smaller helper functions to reduce complexity
- Increase MIN_INTEGRATION_LINKS threshold from 5 to 10

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
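The HTML-removal test and link threshold described in this commit could look roughly like the sketch below. It is an illustration, not the PR's `validateGeneratedContent`: `findLeftoverTags`, `countMarkdownLinks`, `validatePage`, and the `isIntegrationsPage` flag are hypothetical; only the tag list and the threshold of 10 come from the commit message. A literal `<script>` inside a fenced code sample would also be flagged by this naive check.

```typescript
// Hypothetical validation helpers: fail if a stripped element survives in the
// generated markdown, and require a minimum number of links on one page.
const FORBIDDEN_TAGS = ["script", "style", "svg", "nav", "footer", "aside"] as const;
const MIN_INTEGRATION_LINKS = 10;

// One regex per tag, built once: matches an opening tag like <nav> or <nav class="x">,
// but not a longer name such as <navigation>.
const FORBIDDEN_TAG_RES = FORBIDDEN_TAGS.map((tag) => new RegExp(`<${tag}(\\s[^>]*)?>`, "i"));

// Inline markdown links [text](target); the lookbehind skips images ![alt](src).
const MARKDOWN_LINK_RE = /(?<!!)\[[^\]]*\]\([^)\s]+\)/g;

function findLeftoverTags(markdown: string): string[] {
  return FORBIDDEN_TAGS.filter((_, i) => FORBIDDEN_TAG_RES[i].test(markdown));
}

function countMarkdownLinks(markdown: string): number {
  return markdown.match(MARKDOWN_LINK_RE)?.length ?? 0;
}

// Collects human-readable problems; an empty array means the page passed.
function validatePage(markdown: string, isIntegrationsPage: boolean): string[] {
  const errors = findLeftoverTags(markdown).map((tag) => `leftover <${tag}> element`);
  if (isIntegrationsPage) {
    const links = countMarkdownLinks(markdown);
    if (links < MIN_INTEGRATION_LINKS) {
      errors.push(`expected at least ${MIN_INTEGRATION_LINKS} links, found ${links}`);
    }
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a build script report every failing page in one run before exiting.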

Currently our public markdown files are littered with janky JSX components that LLMs can't read, and they link to HTML pages instead of other markdown files. This PR fixes that.
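The link problem described above can be handled with a post-processing pass over the converted markdown. The sketch below is one way to do it, not the PR's code: it assumes internal links are root-relative (such as `/en/guides/setup`) and that each page's markdown twin lives at the same path with a `.md` suffix; `toMarkdownHref` and `rewriteInternalLinks` are hypothetical names.

```typescript
// Hypothetical pass that repoints internal links from HTML pages to markdown files.
// Assumes root-relative internal links and a "<path>.md" twin for every page.
const INLINE_LINK_RE = /(?<!!)\[([^\]]*)\]\(([^)\s]+)\)/g; // [text](href), not images
const SUFFIX_START_RE = /[?#]/;
const HAS_EXTENSION_RE = /\.[a-z0-9]+$/i;
const TRAILING_SLASHES_RE = /\/+$/;

function toMarkdownHref(href: string): string {
  // External URLs, protocol-relative URLs, and same-page anchors stay as they are.
  if (!href.startsWith("/") || href.startsWith("//")) return href;

  // Keep any "?query" or "#fragment" and attach ".md" to the path part only.
  const cut = href.search(SUFFIX_START_RE);
  const path = cut === -1 ? href : href.slice(0, cut);
  const rest = cut === -1 ? "" : href.slice(cut);

  // Assets such as "/logo.png" and links that already end in ".md" are skipped.
  if (HAS_EXTENSION_RE.test(path)) return href;

  const trimmed = path.replace(TRAILING_SLASHES_RE, "");
  // The bare site root has no page name to suffix.
  if (trimmed === "") return href;
  return `${trimmed}.md${rest}`;
}

function rewriteInternalLinks(markdown: string): string {
  return markdown.replace(
    INLINE_LINK_RE,
    (_match: string, text: string, href: string) => `[${text}](${toMarkdownHref(href)})`,
  );
}
```

Running this after HTML-to-markdown conversion keeps the converter itself generic; the site-specific URL policy lives in one small function that is easy to test.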
Summary
- Update the `/api/markdown` endpoint to serve pre-generated clean markdown (falls back to real-time conversion)
- Add a `CopyPageOverride` component that intercepts the "Copy page" button to fetch clean markdown from the API
- Add frontmatter (title, description) from HTML `<meta>` tags to generated markdown files
- Add `public/_markdown/` to `.gitignore`

Changes
- `scripts/generate-clean-markdown.ts`: New script that runs a production server, fetches rendered HTML, and converts to clean markdown using Turndown
- `app/api/markdown/[[...slug]]/route.ts`: Updated to serve pre-generated markdown first, with fallback
- `app/_components/copy-page-override.tsx`: Client component that intercepts copy button clicks
- `app/_components/custom-layout.tsx`: Includes the `CopyPageOverride` component
- `scripts/generate-llmstxt.ts`: Uses pre-generated clean markdown if available
- `package.json`: Added build scripts for generating clean markdown

Test plan
- Run `pnpm build` to verify the build succeeds
- Run `pnpm generate:clean-markdown` to verify markdown generation works
- Verify `/api/markdown/en/home.md` returns markdown with frontmatter

🤖 Generated with Claude Code
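The "Copy page" request path can be sketched end to end as below. This is a minimal, synchronous illustration, not the PR's implementation: `markdownUrlForPath`, `resolveMarkdown`, and `MarkdownSources` are hypothetical names; the `<page path>.md` URL shape is inferred from the `/api/markdown/en/home.md` test-plan example; and the on-disk layout under `public/_markdown/` is an assumption. A real Next.js route handler would be async and read files with `fs`.

```typescript
// Hypothetical sketch of the "Copy page" round trip:
// page path -> API URL (client) -> pre-generated file or live conversion (server).
const API_PREFIX = "/api/markdown";
const MARKDOWN_DIR = "public/_markdown"; // assumed layout: <dir>/<slug...>.md
const TRAILING_SLASHES_RE = /\/+$/;
const MD_SUFFIX_RE = /\.md$/;

// Client side: "/en/home" -> "/api/markdown/en/home.md".
// Returns null for the site root, where no page slug is known.
function markdownUrlForPath(pathname: string): string | null {
  const clean = pathname.replace(TRAILING_SLASHES_RE, "");
  return clean === "" ? null : `${API_PREFIX}${clean}.md`;
}

// Server side: the two markdown sources are injected so the lookup order
// can be shown without touching the filesystem or network.
interface MarkdownSources {
  readPregenerated: (filePath: string) => string | null; // null when the file is missing
  convertLive: (pagePath: string) => string; // e.g. render HTML, then convert
}

type Resolved = { body: string; source: "pregenerated" | "live" };

function resolveMarkdown(slug: string[] | undefined, sources: MarkdownSources): Resolved | null {
  if (!slug || slug.length === 0) return null;
  // Reject empty, "." or ".." segments and embedded separators to block path traversal.
  const unsafe = slug.some(
    (part) => part === "" || part === "." || part === ".." || part.includes("/") || part.includes("\\"),
  );
  if (unsafe) return null;

  // Catch-all segments arrive as ["en", "home.md"]; drop the ".md" suffix from the last one.
  const last = slug[slug.length - 1].replace(MD_SUFFIX_RE, "");
  if (last === "") return null;
  const relative = [...slug.slice(0, -1), last].join("/");

  // Prefer the build-time file; fall back to converting the page on request.
  const pregenerated = sources.readPregenerated(`${MARKDOWN_DIR}/${relative}.md`);
  if (pregenerated !== null) return { body: pregenerated, source: "pregenerated" };
  return { body: sources.convertLive(`/${relative}`), source: "live" };
}
```

On the client, the existing button's click handler could pass the URL from `markdownUrlForPath(window.location.pathname)` to `fetch` and hand the response text to `navigator.clipboard.writeText`, keeping the default copy behavior when the helper returns `null` or the request fails.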